lift value
Nearly-Linear Time Private Hypothesis Selection with the Optimal Approximation Factor
Aliakbarpour, Maryam, Shi, Zhan, Stevens, Ria, Wang, Vincent X.
Estimating the density of a distribution from its samples is a fundamental problem in statistics. Hypothesis selection addresses the setting where, in addition to a sample set, we are given $n$ candidate distributions -- referred to as hypotheses -- and the goal is to determine which one best describes the underlying data distribution. This problem is known to be solvable very efficiently, requiring roughly $O(\log n)$ samples and running in $\tilde{O}(n)$ time. The quality of the output is measured via the total variation distance to the unknown distribution, and the approximation factor of the algorithm determines how large this distance is compared to the optimal distance achieved by the best candidate hypothesis. It is known that $ฮฑ= 3$ is the optimal approximation factor for this problem. We study hypothesis selection under the constraint of differential privacy. We propose a differentially private algorithm in the central model that runs in nearly-linear time with respect to the number of hypotheses, achieves the optimal approximation factor, and incurs only a modest increase in sample complexity, which remains polylogarithmic in $n$. This resolves an open question posed by [Bun, Kamath, Steinke, Wu, NeurIPS 2019]. Prior to our work, existing upper bounds required quadratic time.
Towards Multi-Stakeholder Evaluation of ML Models: A Crowdsourcing Study on Metric Preferences in Job-matching System
While machine learning (ML) technology affects diverse stakeholders, there is no one-size-fits-all metric to evaluate the quality of outputs, including performance and fairness. Using predetermined metrics without soliciting stakeholder opinions is problematic because it leads to an unfair disregard for stakeholders in the ML pipeline. In this study, to establish practical ways to incorporate diverse stakeholder opinions into the selection of metrics for ML, we investigate participants' preferences for different metrics by using crowdsourcing. We ask 837 participants to choose a better model from two hypothetical ML models in a hypothetical job-matching system twenty times and calculate their utility values for seven metrics. To examine the participants' feedback in detail, we divide them into five clusters based on their utility values and analyze the tendencies of each cluster, including their preferences for metrics and common attributes. Based on the results, we discuss the points that should be considered when selecting appropriate metrics and evaluating ML models with multiple stakeholders.
MeshDQN: A Deep Reinforcement Learning Framework for Improving Meshes in Computational Fluid Dynamics
Lorsung, Cooper, Farimani, Amir Barati
Meshing is a critical, but user-intensive process necessary for stable and accurate simulations in computational fluid dynamics (CFD). Mesh generation is often a bottleneck in CFD pipelines. Adaptive meshing techniques allow the mesh to be updated automatically to produce an accurate solution for the problem at hand. Existing classical techniques for adaptive meshing require either additional functionality out of solvers, many training simulations, or both. Current machine learning techniques often require substantial computational cost for training data generation, and are restricted in scope to the training data flow regime. MeshDQN is developed as a general purpose deep reinforcement learning framework to iteratively coarsen meshes while preserving target property calculation. A graph neural network based deep Q network is used to select mesh vertices for removal and solution interpolation is used to bypass expensive simulations at each step in the improvement process. MeshDQN requires a single simulation prior to mesh coarsening, while making no assumptions about flow regime, mesh type, or solver, only requiring the ability to modify meshes directly in a CFD pipeline. MeshDQN successfully improves meshes for two 2D airfoils.
Patterns of near-crash events in a naturalistic driving dataset: applying rules mining
Kong, Xiaoqiang, Das, Subasish, Zhou, Hongmin, Zhang, Yunlong
The estimated economic cost of all fatalities due to traffic crashes in 2018 was approximately $55 billion in the United States (CDC, 2020). Such a huge cost warrants continued investigation into the contributing factors of crash fatalities and the implementation of effective countermeasures for improving traffic safety. Traditional safety studies have generally focused on identifying correlations between crashes and roadway features. Due to a lack of substantial driving behavior information in conventional historical crash datasets, these studies can seldom identify driving behaviors that contribute to crashes. Moreover, traditional studies require crash data spanning an extended period of time.
A Deep Belief Network Based Machine Learning System for Risky Host Detection
Feng, Wangyan, Wu, Shuning, Li, Xiaodan, Kunkle, Kevin
To assure cyber security of an enterprise, typically SIEM (Security Information and Event Management) system is in place to normalize security event from different preventive technologies and flag alerts. Analysts in the security operation center (SOC) investigate the alerts to decide if it is truly malicious or not. However, generally the number of alerts is overwhelming with majority of them being false positive and exceeding the SOC's capacity to handle all alerts. There is a great need to reduce the false positive rate as much as possible. While most previous research focused on network intrusion detection, we focus on risk detection and propose an intelligent Deep Belief Network machine learning system. The system leverages alert information, various security logs and analysts' investigation results in a real enterprise environment to flag hosts that have high likelihood of being compromised. Text mining and graph based method are used to generate targets and create features for machine learning. In the experiment, Deep Belief Network is compared with other machine learning algorithms, including multi-layer neural network, random forest, support vector machine and logistic regression. Results on real enterprise data indicate that the deep belief network machine learning system performs better than other algorithms for our problem and is six times more effective than current rule-based system. We also implement the whole system from data collection, label creation, feature engineering to host score generation in a real enterprise production environment.
Association Rules and the Apriori Algorithm: A Tutorial
When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one's needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways. While we may know that certain items are frequently bought together, the question is, how do we uncover these associations? Besides increasing sales profits, association rules can also be used in other fields.
Association Rules and the Apriori Algorithm: A Tutorial
When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one's needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways. While we may know that certain items are frequently bought together, the question is, how do we uncover these associations? Besides increasing sales profits, association rules can also be used in other fields.
Association Rules and the Apriori Algorithm
When we go grocery shopping, we often have a standard list of things to buy. Each shopper has a distinctive list, depending on one's needs and preferences. A housewife might buy healthy ingredients for a family dinner, while a bachelor might buy beer and chips. Understanding these buying patterns can help to increase sales in several ways. While we may know that certain items are frequently bought together, the question is, how do we uncover these associations?